Support find similar search #122
…s into postgresql-server
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
This PR adds support for "find similar" search functionality, allowing text and vector-based similarity queries in ApertureDB through SQL. The implementation enables queries like `SELECT * FROM "crawl-to-rag" WHERE _find_similar = FIND_SIMILAR(text:='find entity', k:= 10) AND _blobs`.
Key changes include:
- Implemented find similar search with text embedding support
- Refactored query execution to use callback-based configuration instead of direct parameter passing
- Added special `_blobs` parameter to control blob return behavior instead of automatic inclusion in `SELECT *`
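
As a rough illustration of the first and third changes, the sketch below shows how a text string might be embedded and turned into a `FindDescriptor` command with an optional `blobs` flag. It is not the code from this PR: `toy_embed` and `build_find_similar` are hypothetical names, the hash-based embedder is only a stand-in for the real model, and the field names (`set`, `k_neighbors`, `distances`, `blobs`, `results`) and the float32 byte encoding reflect my understanding of the ApertureDB query language and should be checked against its documentation.

```python
import hashlib
import struct
from typing import Any


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in embedder: a deterministic vector derived from a hash.

    A real implementation would use the same model that created the descriptor set.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def build_find_similar(
    set_name: str, text: str, k: int, blobs: bool = False
) -> tuple[list[dict[str, Any]], list[bytes]]:
    """Build an assumed FindDescriptor command plus the query vector as a blob."""
    vector = toy_embed(text)
    query = [
        {
            "FindDescriptor": {
                "set": set_name,        # descriptor set to search
                "k_neighbors": k,       # number of nearest neighbours to return
                "distances": True,      # ask for the distance of each match
                "blobs": blobs,         # only return blobs when explicitly requested
                "results": {"all_properties": True},
            }
        }
    ]
    # Assumption: the vector travels as little-endian float32 bytes.
    blob = struct.pack(f"<{len(vector)}f", *vector)
    return query, [blob]


query, query_blobs = build_find_similar("crawl-to-rag", "find entity", k=10)
```

Leaving `blobs` at `False` by default mirrors the behaviour described here: a plain `SELECT *` would not pull blobs unless `_blobs` is also given.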
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| base/docker/scripts/embeddings/embeddings.py | Added check_properties method to validate embedder properties |
| apps/sql-server/fdw/fdw/table.py | New module defining TableOptions with callback support for command modification |
| apps/sql-server/fdw/fdw/system.py | Refactored system table creation with new callback architecture |
| apps/sql-server/fdw/fdw/entity.py | Updated entity table creation to use new TableOptions structure |
| apps/sql-server/fdw/fdw/descriptor.py | Added find similar functionality with embedding support and vector operations |
| apps/sql-server/fdw/fdw/connection.py | Updated connection table creation for new architecture |
| apps/sql-server/fdw/fdw/common.py | Introduced Curry class for serializable callbacks and removed old table/column options |
| apps/sql-server/fdw/fdw/column.py | New module with ColumnOptions and blob handling utilities |
| apps/sql-server/fdw/fdw/__init__.py | Major refactor of FDW execution logic with callback-based query building |
| apps/sql-server/app/sql/functions.sql | Added FIND_SIMILAR SQL function for similarity queries |
| apps/sql-server/Dockerfile | Updated to install embeddings dependencies and use custom multicorn2 branch |
| apps/rag/Dockerfile | Added embeddings dependencies installation |
| apps/crawl-to-rag/Dockerfile | Added embeddings dependencies installation |
drewaogle left a comment
looks good
Docker images for version v2025.8.5 were built and pushed after this PR was merged.
The key feature added by this PR is support for queries of the form `SELECT * FROM "crawl-to-rag" WHERE _find_similar = FIND_SIMILAR(text:='find entity', k:= 10) AND _blobs`.

Here we are able to take a text string, embed it using the same model as was used to create a descriptor set, and pass it to the `FindDescriptor` call.

This PR also includes a refactoring of the way we control the behaviour of query execution. The control is now rooted in the configuration supplied in `import_schema`, and the `execute` method is now a much simpler beast that uses callback functions. This was a little tricky to do, as the configuration has to be communicated via text strings to a different invocation environment, which made it hard to pass function pointers.

I have also added a special boolean parameter `_blobs` for any table that can return blobs, corresponding to the `blobs` parameter. This means that blobs are no longer returned just because the user says `SELECT *`. Note that making `_blobs` work required me to add some features to the underlying multicorn2 library. See pgsql-io/multicorn2#78.

I also added a rudimentary EXPLAIN feature that shows the AQL in the SQL interface.
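
To make the callback problem concrete, here is a minimal sketch of one way to pass a "function pointer" through a text-only option: store the function's name and keyword arguments as JSON, and look the function up in a registry on the other side. The PR's actual `Curry` class is not shown on this page, so everything below (`REGISTRY`, `register`, `to_string`, `from_string`, `set_blobs`) is a hypothetical illustration rather than its real interface.

```python
import json
from typing import Any, Callable

# Registry of callbacks that may be referenced by name from option strings.
REGISTRY: dict[str, Callable[..., Any]] = {}


def register(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Make a function addressable by its name."""
    REGISTRY[fn.__name__] = fn
    return fn


class Curry:
    """A callback stored as (function name, keyword arguments) so it survives text transport."""

    def __init__(self, name: str, **kwargs: Any) -> None:
        if name not in REGISTRY:
            raise KeyError(f"unknown callback: {name}")
        self.name = name
        self.kwargs = kwargs

    def __call__(self, *args: Any) -> Any:
        # Apply the stored keyword arguments on top of the call-time arguments.
        return REGISTRY[self.name](*args, **self.kwargs)

    def to_string(self) -> str:
        return json.dumps({"name": self.name, "kwargs": self.kwargs}, sort_keys=True)

    @classmethod
    def from_string(cls, text: str) -> "Curry":
        data = json.loads(text)
        return cls(data["name"], **data["kwargs"])


@register
def set_blobs(command: dict, value: bool) -> dict:
    """Hypothetical callback: set the blobs flag on a single-command dict."""
    body = next(iter(command.values()))
    body["blobs"] = value
    return command


# Configuration side: the callback is written out as plain text.
option_text = Curry("set_blobs", value=True).to_string()

# Execution side: the text is turned back into a callable and applied.
restored = Curry.from_string(option_text)
command = restored({"FindDescriptor": {"set": "crawl-to-rag"}})
```

The trade-off in this sketch is that only registered, named functions with JSON-serializable arguments can be passed, which keeps the option strings readable and avoids unpickling arbitrary code.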